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Abstract 

We study the problem of generating a sample from the stationary distribution of a Markov 
chain, given a method to simulate the chain. We give an approximation algorithm for the case 
of a random walk on a regular graph with n vertices that runs in expected time 0*(v / nr m i x ), 
where r m i x is the mixing time of the chain in L 2 . This is close to the best possible, since \fri is 
a lower bound on the worst-case expected running time of any algorithm. 
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1 Introduction 

Often the only feasible method for sampling from a complex distribution is to simulate a suitably 
chosen Markov chain for sufficiently many steps. Procedures based on this idea, called Markov chain 
Monte Carlo, have been applied to a number of problems such as approximating the permanent 
[7], computing volumes [8] and integrals [9], and approximate counting |10j . In order to find out 
how many steps one needs to simulate the chain, it is necessary to determine the mixing time, i.e., 
the number of steps necessary to bring the distribution close to stationary. This analysis is often 
complex and has to be tailored to the specific type of Markov chain under consideration. 

Another, related line of research has been pursued (see pT2J for background and see also [3|[Tllllj): 
the Markov chain is generic, its transition probabilities are not known, and the algorithm is given 
a procedure to generate the next state of the chain based on the current state. Since the chain 
is arbitrary, the algorithm cannot have any advanced knowledge of the mixing time. Aldous p], 
comes by time 0(rPoly(l/e)) to within e from the stationary distribution in total variation. This 
was improved by the "cycle popping" algorithm invented by Propp and Wilson [12], which runs in 
expected time 0(t), where r is the mean hitting time (expected number of steps the chain takes to 
get from X to Y if X and Y are chosen independently according to the stationary distribution). In 
|llj a random stopping rule for exact sampling from an unknown Markov chain is given where the 
expected number of steps is 6r 4 . It is easy to see that any algorithm for an arbitrary n-state chain 
must have running time at least n: the algorithm must visit every state, since an unvisited state x 
could potentially have a holding probability p(x, x) very close to 1 and hence a very high stationary 
probability. In this note, we show that if the chain is a random walk on a regular graph then 
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there is an approximation algorithm whose running time can be much less. Our algorithm runs in 
expected time 0*(y/nr m i x ), where r mix is the I? mixing time of the chain; this can be much smaller 
than n. We note that our algorithm produces a sample with some error (i.e., the distribution is not 
exactly stationary, although it can be made arbitrarily close), whereas the algorithms described in 
\\.2\ [TTl [3] produce an exact sample. In a somewhat similar spirit of observing a random walk on 
an unknown graph, [4] studies what can be learned by knowing polynomially many return times to 
a fixed vertex of a simple random walk on a regular graph. 

Our analysis is based on a variation of the standard birthday problem. Roughly speaking, in 
a world where there are n possible birthdays, the number of people that you need to pick to be 
confident that at least two have the same birthday is of the order y/n. It turns out that in a similar 
vein, if order y/n copies of a Markov chain with uniformly stationary distribution are run for much 
less than the mixing time then there is likely to be a match, whereas if they are run for much more 
than the mixing time there is a good chance for no match. (See Section [2] for a precise formulation 
of this.) This forms the basis for our algorithm. A similar idea was used by Goldreich and Ron [5], 
as a suggestion for a possible sublinear tester for expansion. 

2 Results 

2.1 The Problem 

The algorithm is given an irreducible, aperiodic n state Markov chain as input. More precisely, 
the algorithm is given the number of states n, a starting state xq, and a procedure nextstateO, 
which, given a state x of the chain outputs state y which is one step of the chain starting from x. 
The problem is to generate a random state according to the stationary distribution. 

We will aim for an e-approximation algorithm, that is, an algorithm that generates a random 
state within total variation distance e of the stationary distribution. 

2.2 Main Theorem 

Let p m ( • , •) be transition probabilities for an irreducible, aperiodic, doubly stochastic n-state 
Markov chain on state space V. Let U denote the uniform distribution over V and for functions [i 

on V, let \\fi\\2 = (j2 X £V «/ u ( x ) 2 ) denote the norm of fi in L 2 (U). For e > 0, let r(e) = min{n : 
\\np n (xQ, •) — 1 1 1 2 < e }- Denote the mixing time in 1? by 



Our main result is the following theorem. 

Theorem 1 Suppose that the Markov chain is a random walk on a regular, connected graph with 
degree at most n. Then there is an algorithm that returns a sample within total variation distance 
e in expected time 0{\fn\og(n/e)T ra \ x ). 

Remark: To obtain a lower bound for the running time, we can consider random walk on the 
complete graph. If an algorithm simulates the chain for less than order y/n steps, it is likely to 
see only distinct states of the chain, hence it couldn't "tell the difference" between the chain and 
random walk on two complete graphs of size n/2 joined by a single edge. (Note that in the second 
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case one would have to do at least order n simulated steps to get an almost uniform sample.) So 
any algorithm needs at least order sfn steps, which is order y/nr^^ steps since the mixing time is 
a constant. We believe that for analogous reasons this would still hold (i.e., any algorithm would 
need order ^/n~T m i x steps) when the complete graph is replaced by a random d-regular graph, but 
it seems harder to prove this. 



Proof: 

The algorithm is as follows. Define A n = n 4 log(2n/e) and I = \1 + 512 v^ ~] . Iterate the following 
procedure for i G {1, 2, . . .} until stopped. 

Let m = 81og(2vl n /e), and perform the following experiment m times. Simulate / copies of the 
Markov chain starting at xq for 2 % steps, generating I samples X\, . . . ,X[. To avoid the possibility 
that the Markov chain has an eigenvalue close to —1, we implement a holding probability of 1/n 
to each state; i.e., each step we do nothing with probability 1/n, otherwise simulate a step of the 
chain. This doesn't change the stationary distribution and ensures that all eigenvalues are at least 
-1 + §. Let 5 = e 2 . Let 

~ l(X k = X m ), (1) 



E 

k<m 



and if Z < (1 + 5/2){ l A^ then count the experiment as a success; otherwise count it as a failure. 
If at least m/2 of the experiments are successful, or if i = |~log 2 A n ~\ then stop; otherwise continue 
with the next value of i. 

Let i' be the value of i when the above procedure terminates. We claim that with high prob- 
ability after 2* steps the chain is very mixed. Hence the algorithm can run another independent 
simulation of the chain for 2* steps and the result is an almost uniform sample from the state space. 

More precisely, let = p 2 \xq,-). We will show that with probability at least 1 — e/2, the 
value of i' is large enough so that \\nfj,(i') — 1 1 1| < 5. 

Analysis of the algorithm. By a conductance bound (see, e.g., [6]), the spectral gap for the 
chain must be at least 1/n 4 , and hence r(<5/2) < n 4 log(2n/5) = A n . Thus if i' = |~log 2 A n ~] then 
2 l > A n , which implies that \\nfx(i') — 1| || < 5/2. 

Next we have to bound the probability that the algorithm stops early on a value of i such 
that ||nu(i) — 1| || > 5/2. Fix i < |~log 2 A n ~\ and let p x be the probability that the chain is at x 
after 2 l steps. Cauchy-Schwarz gives J2 x p 2 > 1/n. It follows that if Z is defined as in (JTJ) then 
E(£) = &E x pl > and hence 

1 5 2 

< FTo ■ (2) 



We also have 



E(Z) - 512 

2y^ 



I ^5T2- (3) 

Note that E(Z) = Q±(l + \\n^(i) - Thus if \\nn(i) - > 5/2, then E(Z) > Q±(l + 5/2) 

and hence 

p(z<Ql(l + ,/2)) < P (Z < E (Z)(l±f)) ,4) 
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< 



< 



P(Z < E(Z)(l 
var(Z) 



«/*)) 



16 



E(Z)S 



< 165" 



1 + ^E(Z) 



E(Z) 



(5) 
(6) 

(7) 



where the second inequality follows from the fact that 



1+8/2 



< 1 



1+s x — 5/4 whenever 5 < 1, the third 
line is Chebyshev's inequality, and the fourth line is Lemma [2] from the Appendix. The upper 
bound ([7]) , and hence the probability of success is at most 
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8 2 VE(Z) I J - 4 



18, 



Thus Hoeffding's bounds imply that the probability of at least m/2 successes in stage i is at 
most e" m / 8 < e/2 A n . Since the number of stages can never be more than A n , the probability 
that i' is such that ||n/x(i') — 1 1 1 1 — $ 1S & t m ost e/2. Recall that 8 = e 2 and note that the to- 



tal variation distance — U\\tv = 2||n//(i') — l||i. Hence, when \\nfi(i') — 1| I2 — ^ we have 

||n/i(i') — l||i < \\n/i(i') — 1| I2 < 5 and hence the total variation distance \\n(i') — U\\tv — e /2- 
It follows that the algorithm will generate a random sample within total variation distance e of 
uniform. 



Running time. Define T^ ix = t(<5/4) and let i* = min{i : 2 % > r^ lix }. If % < i* then the number 
of steps in stage i is at most C(12 % log(2A n /e) for a universal constant C. (We assume that the 
values of the chain are given as 0-1 strings, whose lengths we treat as constant. If we store the 
values of X±, . . . ,Xi in a binary tree, then we can count the number of matches among them in 
0(1) time.) Summing this over i < i* shows that the number of steps corresponding to i < i* 
is 0(l2 i *log(2A n /e)) = 0(W' m ^\og(2A n /e)). Suppose that % > i* . Then 2 i > r^ x and hence 
— U\vi — 6/ A- Step i + 1 occurs only if there are more than m/2 failures in step i. Note that 



E(Z) < + 8 /4) and hence 



*>({)= 



1 + 6/2)) < P [Z>E(Z) 



1 + 6/2 



1 + 6/ A 
< p(z>E(Z)(l + 
var(Z) 



< 



64 1 + h^E(Z) x 

- w — ^ 



E(Z) 



(8) 
(9) 
(10) 

(11) 



where the third line is Chebyshev's inequality and the second line uses the fact that \^Jj\ > 1 + 5/8 
whenever 8 < 1. Thus Hoeffding's bounds give P (stage i + 1 occurs) < P(more than m/2 failures) < 
g-m/8 < e /2A n . Since the maximum number of steps in any stage is 0(A n llog(2A n /e)), summing 
the above bound over i gives an 0{l log(2^4 n /e)) bound. Adding everything up gives a total expected 
running time of 0{lT^ ix log(2A n /e)) = 0(Zr mix log(n/e)). □ 
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Remarks: 



• By running y/n samples after the mixing time is estimated instead of just one, the algorithm 
could actually produce y/n samples and the expected running time would still be 0*(y / nr m i x ). 

• The assumption that the degree is at most n can be relaxed; it is only used to get a poly(n) 
upper bound for the mixing time (in order to bound A n ). 

• In [2] it was observed that a non-backtracking random walk mixes (up to a factor two) faster. 
Thus, in the setting where the algorithm can determine the set of neighbors of a state x, one 
can very slightly reduce the randomness used, as well as the running time, by replacing the 
simple random walk with a non-backtracking random walk. Also simulating the I copies of 
the Markov chain at each stage, can of course be done in parallel. 

3 Appendix 

The following bound on the variance of Z was needed. 

Lemma 2 Let X be a random variable taking values in {l,...,n}, and let pi = P(X = i) for 
1 < i < n. Let X\, . . . ,Xi be independent copies of X and let Z = Y^,i<j = Xj)- Then 



Proof: Part 1 is obvious. For part 2, let M = maxjj>j. Clearly, J2iPi — M 2 , and Cauchy-Schwarz 
gives Y,iPi > ^- Hence 



For 1 < % < j < n, let Z{j = l(Xi = Xj), so that Z = J2i<j Note that cov(Zij, Z^j) = if 
i,j,k,l are distinct. Thus, 



1. E(£) = pf. 





It follows that 




(12) 




i+3 
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Combining this with equation (|12p and part 1 of the lemma yields 



var(Z) < 




E(Z)(l + ^E(Z)) 




< 



E(Z)(l + ^E(Z)), 



establishing part 2 of the lemma. 



□ 
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